The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
视频问题回答(VideoQA)是一项复杂的任务,需要多种模式数据进行培训。但是,对视频的问题和答案的手动注释是乏味的,禁止可扩展性。为了解决这个问题,最近的方法考虑了零拍设置,而无需手动注释视觉问题。特别是,一种有前途的方法调整了在网络级文本数据中预测的冻结自回归语言模型,以适应多模式输入。相比之下,我们在这里建立在冷冻双向语言模型(BILM)的基础上,并表明这种方法为零拍出的VideoQA提供了更强大,更便宜的替代方案。特别是(i)我们使用轻型训练模块将视觉输入与冷冻的BILM结合在一起,(ii)我们使用Web-Scrafe Multi-Mododal数据训练此类模块,最后(iii)我们通过掩盖语言执行零声录像带推断建模,其中蒙版文本是给定问题的答案。我们提出的方法Frozenbilm在零摄影的视频中的表现优于最高的,包括LSMDC-FIB,包括LSMDC-FIB,IVQA,MSRVTT-QA,MSVD-QA,ActivityNet-QA,TGIF-FRAMEQA,TGIF-FRAMEQA,,TGIF-FRAMEQA,,TGIF-FRAMEQA,,,MSRVTT-QA,MSRVTT-QA,MSRVTT-QA,MSRVTT-QA,MSRVTT-QA,,均优于最新技术。 How2QA和TVQA。它还在几次且完全监督的环境中展示了竞争性能。我们的代码和模型将在https://antoyang.github.io/frozenbilm.html上公开提供。
translated by 谷歌翻译
我们考虑在与给定文本查询相对应的视频中定位时空管的问题。这是一项具有挑战性的任务,需要对时间,空间和多模式相互作用进行联合有效的建模。为了解决此任务,我们提出了TubedEtr,这是一种基于变压器的体系结构,灵感来自此类模型在文本条件条件的对象检测中的最新成功。我们的模型特别包括:(i)有效的视频和文本编码器,该视频和文本编码器对稀疏采样帧进行了空间多模式相互作用,以及(ii)共同执行时空定位的时空解码器。我们通过广泛的消融研究证明了我们提出的组件的优势。我们还在时空视频接地任务上评估了我们的完整方法,并在具有挑战性的VIDSTG和HC-STVG基准方面证明了对最新技术的改进。代码和训练有素的模型可在https://antoyang.github.io/tubedetr.html上公开获得。
translated by 谷歌翻译
我们介绍了一种组合变分AutiCencoders(VAE)和深度度量学习的方法,以通过高维和结构化输入空间执行贝叶斯优化(BO)。通过从深度度量学习中调整思路,我们使用BlackBox功能的标签指导来构建VAE潜在空间,促进高斯工艺拟合并产生改善的BO性能。重要的是,对于BO问题设置,我们的方法在半监督的制度中运行,其中只有少数标记的数据点。我们在三个现实世界任务中运行实验,在惩罚的LOGP分子生成基准上实现最先进的结果,只使用先前方法所需的标记数据的3%。作为一种理论贡献,我们提出了vae bo遗憾的证据。
translated by 谷歌翻译
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
translated by 谷歌翻译
The increasingly widespread adoption of large language models has highlighted the need for improving their explainability. We present context length probing, a novel explanation technique for causal language models, based on tracking the predictions of a model as a function of the length of available context, and allowing to assign differential importance scores to different contexts. The technique is model-agnostic and does not rely on access to model internals beyond computing token-level probabilities. We apply context length probing to large pre-trained language models and offer some initial analyses and insights, including the potential for studying long-range dependencies. The source code and a demo of the method are available.
translated by 谷歌翻译
Over the past decade, neural networks have been successful at making predictions from biological sequences, especially in the context of regulatory genomics. As in other fields of deep learning, tools have been devised to extract features such as sequence motifs that can explain the predictions made by a trained network. Here we intend to go beyond explainable machine learning and introduce SEISM, a selective inference procedure to test the association between these extracted features and the predicted phenotype. In particular, we discuss how training a one-layer convolutional network is formally equivalent to selecting motifs maximizing some association score. We adapt existing sampling-based selective inference procedures by quantizing this selection over an infinite set to a large but finite grid. Finally, we show that sampling under a specific choice of parameters is sufficient to characterize the composite null hypothesis typically used for selective inference-a result that goes well beyond our particular framework. We illustrate the behavior of our method in terms of calibration, power and speed and discuss its power/speed trade-off with a simpler data-split strategy. SEISM paves the way to an easier analysis of neural networks used in regulatory genomics, and to more powerful methods for genome wide association studies (GWAS).
translated by 谷歌翻译
Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if crowdsourced and is computationally expensive to extend to new perturbation types if generated using supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters the generation to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples, on three evaluation sets written by human workers and via human-AI collaboration.
translated by 谷歌翻译
We study inductive matrix completion (matrix completion with side information) under an i.i.d. subgaussian noise assumption at a low noise regime, with uniform sampling of the entries. We obtain for the first time generalization bounds with the following three properties: (1) they scale like the standard deviation of the noise and in particular approach zero in the exact recovery case; (2) even in the presence of noise, they converge to zero when the sample size approaches infinity; and (3) for a fixed dimension of the side information, they only have a logarithmic dependence on the size of the matrix. Differently from many works in approximate recovery, we present results both for bounded Lipschitz losses and for the absolute loss, with the latter relying on Talagrand-type inequalities. The proofs create a bridge between two approaches to the theoretical analysis of matrix completion, since they consist in a combination of techniques from both the exact recovery literature and the approximate recovery literature.
translated by 谷歌翻译